Lazy Classifiers Using P-trees

Authors

  • William Perrizo
  • Qin Ding
  • Anne M. Denton

Abstract

Lazy classifiers store all of the training samples and do not build a classifier until a new sample needs to be classified. They differ from eager classifiers, such as decision tree induction, which build a general model (such as a decision tree) before receiving new samples. K-nearest neighbor (KNN) classification is a typical lazy classifier. Given a set of training data, a k-nearest neighbor classifier predicts the class value for an unknown tuple X by searching the training set for the k nearest neighbors to X and then assigning to X the most common class among those k nearest neighbors. Lazy classifiers are faster at training time than eager classifiers, but slower at prediction time, since all computation is delayed until then. In this paper, we introduce approaches to the efficient construction of lazy classifiers, using a data structure, the Peano Count Tree (P-tree). The P-tree is a lossless, compressed representation of the original data that records count information to facilitate efficient data mining. Using the P-tree structure, we introduce two classifiers: a P-tree based k-nearest neighbor classifier (PKNN), and the Podium Incremental Neighbor Evaluator (PINE). Performance analysis shows that our algorithms outperform classical KNN methods.
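As a minimal sketch of the lazy-classification idea described above, plain KNN with majority voting can be written as follows. This is the classical baseline, not the paper's P-tree variant: PKNN derives neighbor counts from compressed P-trees rather than scanning the raw training set, but the prediction logic is the same. The function name and Euclidean distance choice are illustrative assumptions.

```python
from collections import Counter

def knn_classify(training, x, k=3):
    """Classify x by majority vote among its k nearest training samples.

    training: list of (feature_vector, class_label) pairs.
    Uses plain Euclidean distance over the full training set;
    the lazy step is that all work happens here, at prediction time.
    """
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    # Find the k training samples closest to x.
    neighbors = sorted(training, key=lambda pair: dist(pair[0], x))[:k]
    # Assign the most common class among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, with `training = [((0, 0), 'a'), ((0, 1), 'a'), ((5, 5), 'b'), ((6, 5), 'b')]`, the query `knn_classify(training, (1, 1))` returns `'a'`, since two of the three nearest neighbors carry that label.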


Similar Articles

Combining Classifiers in Multimodal Affect Detection

Affect detection where users’ mental states are automatically recognized from facial expressions, speech, physiology and other modalities, requires accurate machine learning and classification techniques. This paper investigates how combined classifiers, and their base classifiers, can be used in affect detection using features from facial video and multichannel physiology. The base classifiers...


Semi-Lazy Learning: Combining Clustering and Classifiers to Build More Accurate Models

Eager learners such as neural networks, decision trees, and naïve Bayes classifiers construct a single model from the training data before observing any test set instances. In contrast, lazy learners such as k-nearest neighbor consider a test set instance before they generalize beyond the training data. This allows making predictions from only a specific selection of instances most similar to th...


k-nearest Neighbor Classification on Spatial Data Streams Using P-trees

Classification of spatial data has become important due to the fact that there are huge volumes of spatial data now available holding a wealth of valuable information. In this paper we consider the classification of spatial data streams, where the training dataset changes often. New training data arrive continuously and are added to the training set. For these types of data streams, building a ...


Pruning Techniques in Associative Classification: Survey and Comparison

Association rule discovery and classification are common data mining tasks. Integrating association rules and classification, also known as associative classification, is a promising approach that derives classifiers highly competitive in accuracy with traditional classification approaches such as rule induction and decision trees. However, the size of the classifiers generated ...


UTD-HLT-CG: Semantic Architecture for Metonymy Resolution and Classification of Nominal Relations

In this paper we present a semantic architecture that was employed for processing two different SemEval 2007 tasks: Task 4 (Classification of Semantic Relations between Nominals) and Task 8 (Metonymy Resolution). The architecture uses multiple forms of syntactic, lexical, and semantic information to inform a classification-based approach that generates a different model for each machine learnin...



Journal title:

Volume   Issue 

Pages  -

Publication date: 2002